Fast gridding of irregular data

ABSTRACT

A method of fast gridding of irregular data, has been developed for spatial interpolation of large irregular spatial point data sets; for example building a 3D geographic terrain grid surface from billions of irregularly spaced xyz coordinates on the earth&#39;s surface. The method developed typically translates into many orders of magnitude gain in computational speed. For example, to produce a gridded data set (having M rows and N columns) from P irregularly located sampling points, the computational steps required can be reduced from a number of the order of O(M×N×P) to a lesser number of the order of O(M×N+P) operations. The method achieves this by ensuring that each of the P sampling points is visited only once. This is particularly significant since spatial data collection devices typically collect data points in the billions. The method described is readily extendible to any number of dimensions.

CROSS REFERENCE TO RELATED APPLICATIONS

This non provisional application relates to a Provisional Application:

Title: Fast Gridding of Irregular Data

U.S. Application No. 60/767,191

EFS ID: 104877

Filing Date: 9 Mar. 2006

the entirety of which is incorporated herein by reference.

BACKGROUND OF THE INVENTION

Introduction

The gradual accumulation of spatial data in all fields of science aswell as the increasing capacity of data acquisition equipment, such asLIDAR (Light Distancing And Ranging) for airborne topographic mapping,is giving rise to an exponentially growing volume of information. Thisgrowth exceeds Moore's law doubling computation capacity every 18months. The common approach, when carrying out analyses on computingplatforms, is to throw more computational resources at the problem. Thisapproach is reaching its scalability limit and, once again, thedevelopment of smart and efficient algorithms is becoming paramount forpractical processing gargantuan data sets.

Spatial analysis based on regular rectangular grids is one method ofimproving the efficiency of data processing and modeling. The regulargrid affords innumerable advantages in statistical analysis, frequencydomain analysis, linear or non-linear modeling and so on. In addition,the regularity of rectangular grids allows the use of hardware assistedvector processing techniques which further leverage Moore's law.

Problem Statement

Most collected data is sampled in irregular or semi regular patterns;therefore an important step in data processing is the marshalling ofdata into a uniform spatial grid. This is done by generating a grid G ofM by N cells from P distributed sampling points characterized by theirvalue Z_(p), and their coordinates X_(p) and Y_(p). The prototypicalgridding method uses the Inverse Distance Weighing (IDW) method tointerpolate a value at each grid point involving the followingcomputation for each grid cell G_(n,m).

$G_{n,m} = {\sum\limits_{p = 1}^{P}{{V_{p}/\sqrt{\left( {X_{p} - {m\;\Delta\; x}} \right)^{2} + \left( {Y_{p} - {n\;\Delta\; y}} \right)^{2}}}/{\sum\limits_{p = 1}^{P}{1/\sqrt{\left( {X_{p} - {m\;\Delta\; x}} \right)^{2} + \left( {Y_{p} - {n\;\Delta\; y}} \right)^{2}}}}}}$

This requires P compound operations for each of M×N cells for a total ofM×N×P compound operations. Other techniques, such as krigging,triangulated irregular network (TIN) and spline interpolation, alsoinvolve the sequential revisiting of each of the P sampling points foreach of the M×N grid points.

If the grid is to approximate the spatial resolution of the originaldata set one would expect the number of grid points M×N to beproportionate to the number of sampling points P. This yields thatprocessing power needed from gridding a data set grows with P². Clearlysuch brute force gridding techniques become untenable with very largedata sets.

Two factors affect the scalability of large data set analysis:processing power (i.e. number of operations required) and memory usage(i.e. number and size of storage units). Often one of these features isnormally compromised to optimize the other. A modest geographic data setby today's standard might be a digital terrain model of an urban areahaving point samples at an average resolution of In over an area of 10km×10 km (100 km2). This is 100,000,000 data points with an X, Y and Zvalue which if represented in double precision translate to 2.3 Gb ofinformation.

The popular LiDAR (Light Distancing and Ranging) technology now commonlydelivers as many as 1 to 2 billion X, Y, Z coordinates in a singlesurveyed region. This represents 20 to 40 Gb of raw information. Thiswill stress the memory capacity of many current computer systemsespecially if more fields are needed for the storage of intermediateresults.

With processing requirements of the order of O(P²) and assuming only oneoperation per point, current hardware requires hundreds of hours ofcalculations. This clearly places interactive processing out of reachand reduces the feasibility of exploratory analysis, that is, performingwhat if scenarios to aid in decision making.

BRIEF SUMMARY OF THE INVENTION

A method of fast gridding of irregular data, has been developed forspatial interpolation of large irregular spatial point data sets; forexample building a 3D geographic terrain grid surface from billions ofirregularly spaced xyz coordinates on the earth's surface. The methoddeveloped typically translates into many orders of magnitude gain incomputational speed. For example, to produce a gridded data set (havingM rows and N columns) from P irregularly located sampling points, thecomputational steps required can be reduced from a number of the orderof O(M×N×P) to a lesser number of the order of O(M×N+P) operations. Themethod achieves this by ensuring that each of the P sampling points isvisited only once. This is particularly significant since spatial datacollection devices typically collect data points in the billions. Themethod described is readily extendible to any number of dimensions.

BRIEF DESCRIPTION OF THE DRAWING

FIG. 1 is a schematic representation of an exemplary embodiment of amethod for fast gridding of irregular data.

DETAILED DESCRIPTION OF THE INVENTION

Single Pass Point Processing

In an exemplary embodiment of a fast gridding method 100, when buildinga surface grid (having M rows and N columns) from P irregularly locatedsampling points, the fast gridding method 100 reduces the processingpower required for data gridding to the order of O(P). This isaccomplished through an algorithm where each individual point, p, isvisited only once 102. For each such point, corresponding grid indicesare easily computed by taking the modulus of the point coordinates 104.This essentially reverses the traditional processing order and takesimmediate advantage of the regularity of the grid. Instead of scanningeach grid point and then searching though all the samples to find amatch, all the data points are scanned and, through simple arithmetic,the indices of the corresponding grid points are found.

In many instances, several sampling points may be matched to the samegrid index and some means of estimating a value at the grid point frommultiple nearby samples is necessary. To accomplish this, two arrays areformed, that are initialized to zero, and for each sample point thefollowing is assigned 108: SUM_(m, n), where the contribution of eachsample assigned to (m, n) is cumulated; COUNT_(m, n), which isincremented each time the a sample is assigned to (m, n).

Once all P sampling points are processed, the grid can be scanned onceto calculate the grid value Z_(m,n) for each grid cell 110. Theestimation of Z_(m,n) is an O(M×N) process using standard techniquessuch as averaging:Z _(m,n)=SUM_(m,n)/COUNT_(m,n)

With appropriately selected M and N the resulting grid will have aspatial resolution comparable to the original data but since thecoordinates X_(m, n), and Y_(m, n) do not need to be stored explicitly,the storage requirements are reduced by 66%. Note that the value of Zcan be stored at the same location as SUM, so that COUNT, which in mostcases is a single byte array, is the only temporary storage required.

The method 100, where for each sample point a single grid location (m,n) is computed, is likely to leave unassigned values of Z_(m,n), thefilling of which will be discussed later.

An alternative exemplary embodiment of the method 100 extends further byallowing the assignment of one sample point to several grid pointsdistributed in m_(i)×n_(i) patterns about the sample point, wherem_(i)×n_(i) defines a small region of influence or footprint for eachsample point 106. The number of operations required by the fast griddingmethod 100 is then proportional to P×m_(i)×n_(i), however, m_(i)×n_(i)is for all practical purposes fixed or selected independently of thenumber of sampling points, thus computational load remains O(P). Thecase of assigning each sample point to m_(i)×n_(i) grid points willresult in a smoothed field, equivalent to having passed the gridded datathrough an m_(i),n_(i) moving-average filter. Spatial details will belost, but the number of unassigned blank spots will have been reduced.

In another alternative exemplary embodiment of the method 100, theresolution of the grid can be augmented a priori to m_(i)×M by n_(i)×N,each sample point assigned a foot print of m_(i)×n_(i) and finally thehigh resolution grid can be decimated back to M×N by simple averaging.At the cost of more temporary memory usage, the result retains theinitial spatial resolution but still provides some gap filling ability.

Interpolation

In an alternative exemplary embodiment of the method 100, other moreadvanced estimation techniques for Z_(m,n) can be incorporated based onthe interpolation of surrounding values using standard interpolationtechniques. The following shows how classical interpolation techniquescan be integrated at this stage 110 into the method 100 in such a waythat that the requirement for only one pass is still satisfied. Examplesinclude:

Inverse Distance Weighting (IDW), which has the advantage that anysample point very close to a grid point will have much more influence onits value, thus greatly reducing the footprint smoothing effect butretaining the gap filling ability. The resulting grid is very close towhat would be obtained by the traditionally applied IDW technique. TheIDW calculation is adapted as follows:

WSUM_(m, n), cumulates Z_(p)/D_(p,m,n) for each sample assigned to (n,m);

WCOUNT_(m, n), cumulates 1/D_(p,m,n) for each sample assigned to (n, m),

D_(p,m,n) is the distance between the sampling point p and the gridpoint (n, m).

Final estimates of the gridded values: Z_(m,n)=WSUM_(m,n)/WCOUNT_(m,n)

For most practical cases the IDW technique is recommended with (m_(i),n_(i))=(3, 3).

Closest estimate, in which case only the point closest to the grid pointis retained in order to assign a value to that grid point. This resultsin no smoothing and forces each grid value to agree exactly with thenearest sample. This technique is the fastest but, in regions of densesampling, it is more prone to aliasing high frequency signals into thegrid. Generally the resulting grid will contain more high frequencyenergy than with other techniques. The Closest estimate calculation foreach sample point, p, requires two arrays:

Z_(m,n) is assigned Z_(p) if D_(p,m,n)<MIN_(m,n) for each sampleassigned to (n, m);

MIN_(m, n) is assigned D_(p,m,n) if D_(p,m,n)<MIN_(m,n), where:D_(p,m,n) is as before,

MIN_(m,n) must be initialized to some large value >√{square root over((m_(i)Δx)²+(n_(i)Δy)²)}{square root over ((m_(i)Δx)²+(n_(i)Δy)²)}

After scanning all the sampling points, Z is obtained directly.

Minimum or Maximum estimates are especially useful when extremes of thesampling points are most relevant. This occurs in hydrography, forexample, where navigation maps must portray the shallowest depth ofwater available in an area to assure that ships do not run aground. Thistechnique purposefully produces a biased estimate Z, is very fast and ismemory efficient. For each sample point, p, the following simpleassignment is:

-   -   if Z_(p)<Z_(m,n) and a Minimum estimate is required then        Z_(m,n), is assigned Z_(p)    -   if Z_(p)>Z_(m,n) and a Maximum estimate is required then        Z_(m,n), is assigned Z_(p)    -   (Z_(m,n) must be initialized to some very large or very small        value, as appropriate)        Pre-Processing

The objective of the fast gridding method 100 is to produce a griddeddata set preserving as much as possible the information richness of theoriginal irregular sample data set. The following problems which arisefrom this objective can be solved in the pre-processing stage:

-   -   1. an appropriate resolution for the grid must be chosen;    -   2. points may need to be aggregated into portions small enough        to be processed.        Choosing an Appropriate Grid Resolution

At the time of sample data aggregation, the data must be scanned toobtain minimum and maximum X and Y coordinates which define a boundingrectangle for the grid. If the sampling points can be assumed to beuniformly distributed, a grid with the same number of cells as thenumber of sampling points should preserve the essential spatial featuresof the data.

Thus the resolution can be derived 112 from:D _(x) =D _(y) =Sqrt(((Max(X _(p))−Min(X _(p)))*(Max(Y _(p))−Min(Y_(p))))/P)/K

And the grid dimensions can be set 112 as:M=(Max(X _(p))−Min(X _(p)))/D _(x)N=(Max(Y _(p))−Min(Y _(p)))/D _(y)

By default K=1, but if the data set is particularly clumpy it then Kshould be set to a larger value to obtain a finer resolution. Converselya coarser resolution grid is obtained by setting K to a value between 0and 1.

Point Aggregation

Although the described gridding method 100 is relatively memoryefficient there are some data sets that are so large (i.e. P>P_(max))that it is not practical to keep the entire sampling data set or thecomplete grid in memory at once (because doing so can result in so muchvirtual memory page swapping as to completely degrade performance). Inan exemplary embodiment of the method 100, where the number of points isexceedingly large, the fast gridding method 100 pre-aggregates samplepoints into coarse tiles 114. This aggregation is very fast and requiresO(P) operations to simply assign the samples to a rectangularsub-domain. Often the input data is already organized in such fashioneither exactly or approximately.

After performing point aggregation, gridding can take place tile bytile. In the case where the region of influence is greater than onecell, i.e. (m_(i), n_(i)) is not equal to (1,1), up to nine tiles of thegridded data set must be kept in memory at once to ensure seamlesstransition across tile boundaries. This still greatly reduces the amountof memory required, and relieves scalability issues with respect to thesize of the sampling data sets.

To achieve further memory economy, the fast gridding method 100 canstore real variables in the form of more compact integers with fixedpoint representation. The calculations described above are performedwith floating point arithmetic but the result can be stored in fixedpoint representation. As gridding progresses through the tiled data set,any tile which is completed is immediately converted to fixed pointrepresentation and transcribed to permanent storage. Thus at most ninetiles need to kept in floating point representation at any one time.

Post-Processing

The nature of irregular data usually results in gaps in the surface gridthat is generated. Efficiencies in filling these gaps are automaticallyaccommodated by the fast gridding method 100.

-   1. Small gaps in the grid resulting from regions of low sampling    density are filled 116;-   2. Larger gaps in the grid can be filled where no data points exist    118;-   3. Finally, the accuracy of the resulting gridded surface in    reproducing the sample data surface must be assessed by calculating    the normalized deviation 120.    Gap Filling

The fast gridding method 100 optionally processes gaps in two passes:

In the first pass, any empty grid cell is assigned a value equal to theaverage of m_(j)×n_(j) non-empty surrounding cells. This will fill smallholes 116 with interpolated, smoothed values and in cases where thesample data points are relatively uniformly distributed will result in acomplete grid. In other cases large empty areas may remain.

In the second pass larger gaps can be filled in one of several ways 118:

-   -   1. assign an arbitrary distinctive value to unassigned regions        as an indication that no data is present;    -   2. compute the average value of the cells bounding the        unassigned region and assign this value to the empty cells, this        is appropriate for example to represent lakes;    -   3. use the IDW technique, or other interpolation techniques, to        interpolate the interior field from the values of cells bounding        the unassigned region, this produces a smooth surface that will        not over or undershoot the bounding region.

Beyond these automatic gap filling techniques, other modeling techniquescan be used to fill larger gaps in the post processing phase.

CONCLUSION

The ability of the fast gridding method 100 to efficiently produceregular gridded data sets from irregularly distributed sampling pointshas been described. The method 100 imposes an O(P) computational loadwhich is a vast improvement over more conventional methods and affordsorders of magnitude improvement in speed.

Although the method 100 has been described in the context of 2Dgeographical data, it is equally applicable in 3D such as for medicalimaging or in N dimensional problems such as may be encountered invarious domains of science. Similarly, the dependent variable Z can be areal number scalar, an integer, a vector or any other combination ofattributes.

The efficient and scalable gridding of irregular data is but one step inthe larger library of algorithms available for the modeling and analysisof huge regular data sets.

What is claimed is:
 1. A computer implemented method for producing agridded data set, having M rows by N columns of grid, from a number ofirregularly located P sampling points in a very large data set used formodeling and analysis of spatial data and stored in computer memory, themethod comprising: computing for each sampling point P, correspondinggrid indices in an M by N grid pattern by taking a modulus ofcoordinates associated with each sampling point; matching each of the Psampling points to a small rectangular pattern of m_(i) rows by n_(i)columns, where m_(i) and n_(i) define a small region of influence,selected independently of the number of sampling points, distributedaround each sample point P; accumulating grid index values in a first Mrow by N column computer memory storage array for each P sampling point;incrementing counters in a second M row by N column computer memorystorage array where multiple P sampling points are matched to the samegrid index; scanning the grid a single time, once all P sampling pointshave been processed, and calculating, for each grid index, a grid valuebased on the interpolation of the grid index values; and storing thecalculated grid value of the first array to the computer memory storage.2. The method of claim 1, wherein prior to computing, the method furthercomprising pre-processing the sampling points, wherein the samplingpoints are aggregated by assigning samples to a rectangular sub-domainof coarse tiles which then undergo the gridding computation tile bytile.
 3. The method of claim 1, further comprising post processing thecalculated grid values by further calculating, for each grid index whichhas not been assigned a grid value, an interpolated grid value, based onthe grid value of those of its immediately neighboring grid indiceswhich have already been assigned a grid value.
 4. The method of claim 1further comprising post processing the calculated grid values by furthercalculating, for each region of adjacent grid indices which have notbeen assigned a grid value, an interpolated grid value, based on thegrid values of the perimeter grid indices bounding each such region. 5.The methods of claim 1 wherein the gridded data set has three or moredimensions; wherein the three dimensional gridded data set has rows,columns and layers of grid indices which comprises a computer memorystorage array; wherein the method of claim 1 is performed for eachlayer.
 6. The methods of claim 1 wherein each grid index in the griddeddata set has two or more associated grid values.
 7. The method of claim1 wherein interpolating the calculated grid values is performed byaveraging the grid values wherein each value in the first array isdivided by the corresponding value in the second array.
 8. The method ofclaim 1 wherein interpolating the calculated grid values is performed byan inverse distance weighting of values in the first array and secondarray to calculate the grid value based on surrounding calculated gridvalues.
 9. The method of claim 1 wherein interpolating the calculatedgrid values is performed by determining a closest grid value to thesampling point P grid point to assign a grid value to the grid point.10. The method of claim 1 wherein interpolating the calculated gridvalues is performed by determining maximum or minimum estimates tocalculate the grid value where extremes of the sampling points are mostrelevant to a type of the data set.
 11. The method of claim 3 whereinthe interpolated grid value determined during post processing isinterpolated by one of: assigning an arbitrary distinctive grid value tounassigned grid indices as an indication that there is not data present;computing an average grid value of cells bounding each unassigned gridindex and assigning this grid value to empty cells; and filling holes inthe gridded data set by performing an inverse distance weighting tointerpolate the interior field from the grid values of cells boundingthe hole.
 12. The method of claim 4 wherein the interpolated valuedetermined during post processing is interpolated by one of: averagingsurrounding grid values; assigning an arbitrary distinctive value tounassigned regions as an indication that there is not data present;computing an average grid value of cells bounding each unassigned regionand assigning this grid value to empty cells; and filling holes in thegridded data set by performing an inverse distance weighting tointerpolate the interior field from the grid values of cells boundingthe hole.
 13. The method of claim 1 wherein the values stored in thefirst and second arrays are stored in the form of compact integers withfixed point representations.
 14. The method of claim 1 wherein storingthe calculated grid values to memory comprises storing the calculatedgrid values in the first array and the second array is deleted.